[[!meta title="Tagged snapshots of upstream APT repositories"]]

[[!toc levels=2]]

# Overview

Our tagged snapshots of upstream APT repositories are published on
<http://tagged.snapshots.deb.tails.boum.org/>.

These are _partial_, tagged snapshots of the upstream APT repositories
we need. They let us rebuild a released ISO in the future, and keep
the corresponding source code around.

The main goals here are to have reproducible builds some day, and to
comply with various licenses such as the GPL.

These snapshots are partial: in a given snapshot, we import only the
packages needed by a given build of Tails.

The corresponding data shall be backed up, and expired very
cautiously, if ever.

# Source code

* `tails::reprepro::snapshots::tagged` class in
  [[!tails_gitweb_repo puppet-tails]]
* bits scattered in the main Tails Git repository (details below)

# Design notes

## Listing needed packages

To generate partial APT repositories, we need to know what to include
in them. Therefore, we create a _build manifest_ at the end of an ISO
build. It is generated by
[[!tails_gitweb auto/scripts/generate-build-manifest]], thanks to
[[!tails_gitweb data/wrappers/apt-get]] and
[[!tails_gitweb data/debootstrap/scripts/jessie.patch]].

Output:

- for each APT repository we use time-based snapshots for: name, serial
- for each binary package: name, version, architecture
- for each source package: name, version
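
As an illustration only, the three kinds of records listed above could be
modelled as follows. This is a minimal sketch: the class names, the JSON
serialization and the sample values (including the serial) are assumptions
made for this example, and do not describe the actual file written by
`generate-build-manifest`.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class SnapshotRef:
    """A time-based APT repository snapshot used during the build."""
    name: str
    serial: str


@dataclass(frozen=True)
class BinaryPackage:
    name: str
    version: str
    architecture: str


@dataclass(frozen=True)
class SourcePackage:
    name: str
    version: str


@dataclass
class BuildManifest:
    """Hypothetical container for the three lists described above."""
    snapshots: list = field(default_factory=list)
    binary_packages: list = field(default_factory=list)
    source_packages: list = field(default_factory=list)

    def to_json(self) -> str:
        # Sort keys so that the output is stable across runs.
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Made-up sample values, for illustration only.
manifest = BuildManifest(
    snapshots=[SnapshotRef("debian", "2016030501")],
    binary_packages=[BinaryPackage("a2ps", "1:4.14-1.1+deb7u1", "amd64")],
    source_packages=[SourcePackage("a2ps", "1:4.14-1.1+deb7u1")],
)
print(manifest.to_json())
```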

In passing, here are some nice side-effects of having this build
manifest:

- It allows inspecting the diff between the subsets of two different
  snapshots that were used at build time. The benefit is quite small as
  long as we're based on Debian stable (although we also fetch packages
  from testing, sid, backports, etc.), but if/when we switch to
  being based on Debian testing, we will definitely want that.
- If a branch (a topic branch, devel, etc.) introduces a regression
  and changes the set of packages used at build time, we may
  want to check exactly how that set changed. Think "check the
  diff between `.packages`" as we do at release time, but done in
  a more correct way.
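
To make the second point concrete, here is a minimal sketch of such a
comparison. It assumes each manifest has already been loaded into
a mapping from (name, architecture) to version; the function name and the
sample data are hypothetical.

```python
def diff_packages(old, new):
    """Compare two {(name, architecture): version} mappings.

    Returns three dicts: packages only in `new`, packages only in `old`,
    and packages whose version differs, as (old version, new version).
    """
    added = {key: new[key] for key in new.keys() - old.keys()}
    removed = {key: old[key] for key in old.keys() - new.keys()}
    changed = {
        key: (old[key], new[key])
        for key in old.keys() & new.keys()
        if old[key] != new[key]
    }
    return added, removed, changed


# Made-up package sets for two builds.
stable_build = {("a2ps", "amd64"): "1:4.14-1.1+deb7u1", ("foo", "amd64"): "1.0-1"}
topic_build = {("a2ps", "amd64"): "1:4.14-1.3", ("bar", "amd64"): "2.0-1"}

added, removed, changed = diff_packages(stable_build, topic_build)
for (name, arch), (before, after) in sorted(changed.items()):
    print(f"{name}:{arch} {before} -> {after}")
```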

## Importing packages into partial snapshots

### How it's done in practice

* [[!tails_gitweb auto/scripts/tag-apt-snapshots]]
* [tails-prepare-tagged-apt-snapshot-import](https://git-tails.immerda.ch/puppet-tails/tree/files/reprepro/snapshots/tagged/tails-prepare-tagged-apt-snapshot-import)
* [tails-publish-tagged-apt-snapshot](https://git-tails.immerda.ch/puppet-tails/tree/files/reprepro/snapshots/time_based/tails-publish-tagged-apt-snapshot)

### A corner case: APT pinning magics

If a (package, version) is seen at build time in 2 or more APT
sources, `tails-prepare-tagged-apt-snapshot-import` injects it
into each of the tagged snapshots corresponding to these sources.

The goal is to avoid the following scenario, which could happen if we
injected each package _only_ into the distribution it was downloaded from:

 - version X of package P is available both in suite S1 on origin O1,
   and in suite S2 on origin O2
 - version Y of package P is available in suite S3 of origin O3
 - our pinning makes us prefer version X of package P *because it's
   available in O1/S1*; otherwise, if it wasn't in there, then our
   pinning would make APT prefer version Y to version X
 - at ISO build time, APT fetches package P version X from O2/S2
 - given this build manifest, we import package P version X into our
   tagged snapshot of O2/S2, but not into our tagged snapshot of O1/S1
 - if we rebuild from the same source tree using that set of tagged
   snapshots, then version Y of package P will be installed

This scenario can happen in practice:

	# cat /etc/apt/sources.list
	deb http://security.debian.org wheezy/updates main
	deb http://ftp.us.debian.org/debian/ wheezy main
	deb http://ftp.us.debian.org/debian/ jessie main

	# cat /etc/apt/preferences
	Package: *
	Pin: origin security.debian.org
	Pin-Priority: -10

	Package: *
	Pin: release o=Debian,n=wheezy
	Pin-Priority: 990

	Package: *
	Pin: release o=Debian,n=jessie
	Pin-Priority: 700

	# apt-cache madison a2ps
	a2ps | 1:4.14-1.3 | http://ftp.us.debian.org/debian/ jessie/main amd64 Packages
	a2ps | 1:4.14-1.1+deb7u1 | http://security.debian.org/ wheezy/updates/main amd64 Packages
	a2ps | 1:4.14-1.1+deb7u1 | http://ftp.us.debian.org/debian/ wheezy/main amd64 Packages

	# apt-cache policy a2ps
	a2ps:
	  Installed: (none)
	  Candidate: 1:4.14-1.1+deb7u1
	  Version table:
	     1:4.14-1.3 0
	        700 http://ftp.us.debian.org/debian/ jessie/main amd64 Packages
	     1:4.14-1.1+deb7u1 0
	        -10 http://security.debian.org/ wheezy/updates/main amd64 Packages
	        990 http://ftp.us.debian.org/debian/ wheezy/main amd64 Packages

And then, APT will download `a2ps` from security.debian.org:

	# apt-get download a2ps --print-uris
	'http://security.debian.org/pool/updates/main/a/a2ps/a2ps_4.14-1.1+deb7u1_amd64.deb' a2ps_4.14-1.1+deb7u1_amd64.deb 956298 sha256:e47d7fe9adb7aa62421108debf425830f4e2385e98151c5cb359d3eb8688eea8

... but if `a2ps` was not available in the regular Wheezy archive,
e.g. because we were using a tagged snapshot that imported `a2ps` only
into the security archive, then APT would prefer `a2ps` from Jessie,
which demonstrates the problem.
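
The effect can be reproduced with a deliberately simplified model of
candidate selection. It assumes that a version's priority is the highest
pin priority among the sources offering it, that versions with a negative
priority are never installed, and that the version with the highest
priority wins. Ties, installed versions and Debian version comparison are
ignored, so this is a sketch and not a reimplementation of APT.

```python
def candidate(versions):
    """Pick a candidate from {version: [pin priority of each source]}."""
    best = {
        version: max(priorities)
        for version, priorities in versions.items()
    }
    # A negative priority prevents the version from being installed.
    installable = {v: p for v, p in best.items() if p >= 0}
    if not installable:
        return None
    return max(installable, key=installable.get)


# Priorities taken from the `apt-cache policy a2ps` output above.
full_archive = {
    "1:4.14-1.3": [700],
    "1:4.14-1.1+deb7u1": [-10, 990],
}
# Same, but with a2ps missing from the regular Wheezy archive.
security_only = {
    "1:4.14-1.3": [700],
    "1:4.14-1.1+deb7u1": [-10],
}

print(candidate(full_archive))   # 1:4.14-1.1+deb7u1
print(candidate(security_only))  # 1:4.14-1.3
```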

## Valid-Until

A tagged APT repository snapshot that was used to build a given Tails
release is immutable by design, so it does not need the protections
provided by `Valid-Until`. Besides, not using `Valid-Until` for those
makes it much easier to reproduce a given ISO build in the future.

So, the `Release` files for tagged snapshots have no
`Valid-Until` field.
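
A quick way to verify this property is to parse the top-level fields of
a `Release` file and check that `Valid-Until` is absent. The sketch below
assumes the usual "Field: value" layout with indented continuation lines;
the helper names and the sample file contents are made up for this example.

```python
def release_fields(release_text):
    """Parse the top-level fields of an APT Release file into a dict."""
    fields = {}
    for line in release_text.splitlines():
        # Continuation lines (e.g. checksum entries) start with whitespace.
        if not line or line[0].isspace() or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key] = value.strip()
    return fields


def has_valid_until(release_text):
    return "Valid-Until" in release_fields(release_text)


# Made-up Release contents, for illustration only.
tagged_release = """\
Origin: Debian
Codename: wheezy
Date: Sat, 05 Mar 2016 10:00:00 UTC
SHA256:
 0123abcd 1234 main/binary-amd64/Packages
"""
time_based_release = tagged_release + "Valid-Until: Sat, 12 Mar 2016 10:00:00 UTC\n"

print(has_valid_until(tagged_release))
print(has_valid_until(time_based_release))
```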

## Garbage collection

We want to keep "forever" the tagged snapshots used by Tails releases.

In practice, "forever" = min(3 years for the GPL, how long we want to be
able to reproduce the build of a released ISO) = 3 years.

Depending on the growth rate of our tagged snapshots in practice, we
may or may not need to implement expiration of these snapshots any
time soon. Time will tell.
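
Should expiration ever be implemented, the retention rule above could be
checked along these lines. This is only a sketch under the 3 years
assumption stated above: the function name is hypothetical and nothing like
it is known to exist in our code base.

```python
from datetime import date

RETENTION_YEARS = 3


def may_expire(release_date, today):
    """True once a snapshot's release is at least RETENTION_YEARS old."""
    target_year = release_date.year + RETENTION_YEARS
    try:
        cutoff = release_date.replace(year=target_year)
    except ValueError:
        # release_date is 29 February and target_year is not a leap year:
        # err on the side of caution and wait until 1 March.
        cutoff = date(target_year, 3, 1)
    return today >= cutoff


print(may_expire(date(2016, 1, 26), date(2019, 1, 25)))  # False
print(may_expire(date(2016, 1, 26), date(2019, 1, 26)))  # True
```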
